Abstract

Human facial emotion recognition is a difficult task in computer-human interaction. Facial emotion recognition is required in many applications such as medicine, security, video games, e-physiotherapy, and counselling. The literature has many studies that have focused only on 6 basic emotions, but advanced studies suggest human emotions are not limited to these 6 basic emotions. A human face can exhibit many other emotions, which are generated by combining two basic emotions; these derived emotions are known as compound emotions. Recognition of compound emotions is also a very important task; hence this study proposes the use of the Facial Action Coding System (FACS) to identify 12 compound emotions. The authors identified and derived the intensities of 17 AUs with the OpenFace library. Finally, two machine learning classifiers, SVM (Support Vector Machine) and K-NN (K-nearest neighbour), were implemented to identify the 12 compound emotions, and the results were compared. The experimental results show that the SVM classifier performed better, with an emotion recognition rate of 98.31%, while the recognition rate of K-NN was 93.66%. The authors also used SHAP values to observe the association of AUs with each compound emotion.

Keywords: Compound Emotion; FACS; OpenFace; SVM


1. Introduction 


The human face is considered to be the mirror of emotions. The face and its expressions are the most powerful way to convey an emotional state (Ekman and Rosenberg). The movement of facial features on the human face is known as facial expression, and these facial expressions are used to define human emotions (Swaminathan, Vadivel, and Arock). Most previous studies of facial expression recognition and emotion detection focused only on 7 basic emotions (happy, sad, anger, disgust, fear, surprise, and neutral). However, apart from these 7 basic emotions, recent studies have defined 21 other compound emotions (Du, Tao, and Martinez). Compound emotions can be generated by combining any 2 basic emotions; for example, the happy and surprised emotions can be combined to produce the happily surprised compound emotion. The images of the 12 compound emotions identified in this research are shown in Figure 1 (Du, Tao, and Martinez). Facial Action Coding System (FACS) analysis shows that the production of these 12 compound emotions differs from that of the basic emotions, although they can be generated using basic emotions. The psychologists Ekman and Friesen (Ekman and Friesen) proposed FACS in 1978, after studying the anatomy of the face and the classification of facial expressions. Since then, FACS has been a leading standard in facial behaviour research. Its use is not limited to behavioural science research; it is also widely applied in computer analysis of the face (Ekman and Rosenberg). FACS helps in identifying and scoring Action Units (AUs), which represent the muscular activity that produces momentary changes in facial appearance.
FIGURE 1. Images of 12 Compound Emotions (recoverable panel labels: Sadly fearful, Sadly disgusted, Sadly angry, Disgustedly surprised)


FACS divides the human face into 46 AUs; each AU is represented by a name and a number, for example, AU4 is Brow Lowerer. Individual AUs may be unable to express a distinct semantic facial expression, but combinations of them can. This method has served as a crucial psychological foundation in the field of facial expression recognition (FER) (Tan et al.). This research proposes a technique to recognise 12 compound emotions based on facial action unit detection, in order to better understand human emotions. Further, SHAP values are used to show the contribution of AUs in defining a compound emotion.
2. Related Work


One study (Zhu, Li, and Wu) identified 7 basic expressions on the Yale dataset and 8 expressions on the JAFFE dataset, using Equable Principal Component Analysis (PCA) to represent emotional features and Linear Regression Classification (LRC) as the classifier. The proposed PCA approach for extracting facial features has a significant capacity for enhancing the accuracy and generalisation performance of the feature vector, and the LRC classifier is particularly efficient in recognising the other expressions. The accuracy was 89.1% and 91.1% on the Yale and JAFFE databases respectively. Another work employed a cascade regression tree to extract facial features from the CK+ dataset and compared logistic regression, SVM, and NN classifiers to present a Facial Emotion Recognition (FER) system that identifies six facial emotions (Bilkhu, S. Gupta, and Srivastava). Another study applied both PCA and kernel-based PCA to 3D face images to identify 4 facial expressions. On the Imperial College London dataset, with a K-NN classifier, kernel PCA reached 77.29% accuracy whereas PCA managed only 52.69% (Peter, Minoi, and Hipiny).

A combination of Local Tetra Pattern (LTrP) and Local Directional Number Pattern (LDPN) was implemented to identify 4 expressions (disgust, smile, sad, and surprise) on the JAFFE dataset with an accuracy rate of 90%. The approach encodes the association between a reference pixel and its neighbours. LDPN codes patterns from the texture of a face using directional information, which is more stable against noise than intensity. LTrP describes the spatial structure of the local texture using the direction of the centre pixel; it encodes the image according to the pixel direction, determined by the vertical and horizontal derivatives (Emmanuel and Revina).

A pyramid representation of the uniform Temporal Local Binary Pattern (PTLBPu2), extracted using orthogonal planes and a low-dimensional feature space, was suggested in (Abdallah, Guermazi, and Hammami) as a dynamic technique for extracting facial features from expressions in videos. The most discriminating sub-regions are then chosen, and PCA is used to reduce the feature space to a low-dimensional one. An SVM classifier and the C4.5 algorithm are used to classify facial expressions. The popular facial expression databases MMI and CK+ were used for the experiments, which showed that 92% of recognitions were accurate in an uncontrolled situation.

Another work in FER used the Histogram of Oriented Gradients (HOG) for feature extraction and the Viola-Jones algorithm for face detection. To reduce dimensionality and extract the most [...] recognition rate of 93.53% with the SVM, 82.97% with the MLP, and 79.97% with the K-NN classifier (Dino and Abdulrazzaq). In (Tarnowski et al.), 7 basic emotions were analyzed. Coefficients describing elements of facial expression were used as features, and K-NN and MLP neural network classifiers achieved accuracies of 96% and 90% respectively. A summary of related work is given in Table 1.


3. Proposed Methodology 


Datasets: The Extended Cohn-Kanade (CK+) dataset is employed in this research. CK+ is widely used for classifying facial expressions under laboratory control. It contains 593 video clips from 123 unique subjects between the ages of 18 and 50, representing different genders and ethnicities. Each video, shot at 30 frames per second, shows a change in facial expression from neutral to a definite peak expression. Of these clips, 327 have been assigned one of seven expression classes: anger, contempt, disgust, fear, happiness, sadness, and surprise. To discover the relation between the AUs and emotions, we use a portion (500 sequences) of the CK+ collection that has been labelled for AU and emotion. We use the remaining CK+ data (images not used for learning) and real-world videos to validate the performance. For this work, we collected videos from 11 subjects; the videos were gathered for an experiment that categorizes a subject's emotions.

Action Unit Extraction 

To evaluate facial expressions with FACS, the authors used the OpenFace library to detect and classify the analyzed AUs. OpenFace detects faces within a rectangular region, and as a result noise such as background colour is present when faces are recognized (Baltrusaitis et al.). With the help of dlib, 68 facial landmarks can be located, which allows such noise to be removed from a facial image. OpenFace is an open-source library that incorporates facial landmark identification, emotion recognition, head pose estimation, and eye gaze estimation; it has also been used to pre-process SAMM Long Videos (C. H. Yap, Kendrick, and M. H. Yap). In this work, the authors pay attention solely to face alignment and AU detection. Face alignment uses an affine transformation: dlib's face landmark detector picks up the facial landmarks (Baltrusaitis et al.), and the landmarks of the currently identified face are aligned with those of a neutral appearance using a similarity transform. OpenFace's Convolutional Experts Constrained Local Model (CE-CLM) uses deep networks to identify and track facial landmarks (T. Pravin et al.). The deep network was reduced from the original version, which had 180,000 parameters, to roughly 90,000 parameters. With little accuracy loss, this lessens the complexity of the model and speeds it up by 1.5 times. Additionally, CE-CLM employs 11 initialization hypotheses at various orientations, which results in a 4-fold gain in performance. It also makes use of sparse response maps, which increases model performance by 1.5 times, and alignment to the reference frame eliminates changes brought on by scaling (V. S. R. T. Pravin and Thiruppathi). The final output is 112 by 112 pixels in size and has an interpupillary distance of 45 pixels. The complete framework of OpenFace is shown in Figure 2.


FIGURE 2. Framework of Openface 2.0 


While the occurrence of 18 AUs is reported as a binary value (0 absent, 1 present), AU intensity is categorized into 6 levels: AU not present (O), Mild (A), Slight (B), Moderate (C), Severe (D), and Extremely severe (E). Further, the intensities of 17 AUs are presented as a regression output from 0 to 100. For the construction of our model, we concentrated on the presence of 17 AUs (excluding AU45).
TABLE 1. Summarized Related Work

Authors                                                   #Emotions    Dataset                   Accuracy   Classifier   Drawback
                                                          Identified
Emmanuel and Revina (2015)                                4            JAFFE                     90%        SVM          Few emotions identified and lower accuracy
Zhu et al. (2016) (Zhu, Li, and Wu)                       7            Yale and JAFFE            91.1%      LRC          Pollution problem in face recognition
Tarnowski et al. (2017) (Tarnowski et al.)                7            KDEF                      96%        KNN          No compound emotions were analyzed
                                                                                                 90%        MLP
Abdallah et al. (2018) (Abdallah, Guermazi, and Hammami)  6            CK+                       92%        SVM          No compound emotions were analyzed
Bilkhu et al. (2019) (Bilkhu, S. Gupta, and Srivastava)   6            CK+                       89%        SVM          Less effective recognition performance
Hivi and Maiwan (2019) (Dino and Abdulrazzaq)             7            CK+                       93.33%     SVM          No compound emotions were analyzed
                                                                                                 82.57%     NN
                                                                                                 79.97%     KNN
Peter et al. (2019) (Peter, Minoi, and Hipiny)            4            Imperial College London   77.29%     KNN          Few emotions identified and lower accuracy


Frames with intensity levels of at least 2 were desig- 
nated as positive examples, whereas the remaining 
frames were designated as negative examples. 
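
The labelling rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the AU names, the example intensities, and the helper names `binarize_frame` and `positive_frames` are all hypothetical, and only the "at least 2" threshold comes from the text.

```python
from typing import Dict, List

# Threshold taken from the text ("intensity levels of at least 2");
# the intensity scale of the example values below is an assumption.
INTENSITY_THRESHOLD = 2.0


def binarize_frame(intensities: Dict[str, float],
                   threshold: float = INTENSITY_THRESHOLD) -> Dict[str, int]:
    """Map each AU intensity in one frame to 1 (positive) or 0 (negative)."""
    return {au: int(value >= threshold) for au, value in intensities.items()}


def positive_frames(frames: List[Dict[str, float]], au: str,
                    threshold: float = INTENSITY_THRESHOLD) -> List[int]:
    """Return the indices of frames that are positive examples for one AU."""
    return [i for i, frame in enumerate(frames) if frame[au] >= threshold]


# Hypothetical per-frame AU intensities (not taken from the paper's data).
frames = [
    {"AU04": 0.3, "AU12": 2.4, "AU25": 1.9},
    {"AU04": 2.0, "AU12": 3.1, "AU25": 0.0},
]
labels = [binarize_frame(frame) for frame in frames]
print(labels)
```

Keeping the threshold as a named constant makes it easy to check how sensitive the positive/negative split is to that choice.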

Classifiers 

Two machine learning algorithms, K-NN and SVM, were employed to handle this multi-class classification problem. These models are efficient and can be trained on a small dataset. The K-nearest neighbour classifier searches the pattern space for the k training tuples that are closest to the unknown tuple. These k training tuples are the k "nearest neighbours" of the unknown tuple.
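
As a rough illustration of the nearest-neighbour rule just described, the sketch below classifies an unknown tuple by majority vote among its k closest training tuples. It assumes Euclidean distance and binary AU feature vectors; the paper states neither the distance metric nor the value of k, and the training data here is invented.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train: List[Tuple[Sequence[float], str]],
                query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training tuples closest to query."""
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical 4-dimensional AU presence vectors (illustrative only).
train = [
    ([1, 1, 0, 0], "happily surprised"),
    ([1, 1, 0, 1], "happily surprised"),
    ([1, 0, 0, 0], "happily surprised"),
    ([0, 0, 1, 1], "sadly disgusted"),
    ([0, 1, 1, 1], "sadly disgusted"),
    ([0, 0, 1, 0], "sadly disgusted"),
]
prediction = knn_predict(train, [1, 1, 0, 0], k=3)
print(prediction)
```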

SVM is a classification and regression prediction tool that uses machine learning theory to maximise predictive accuracy while automatically avoiding over-fitting to the data. Researchers have employed a variety of techniques to address multiclass classification, including one-vs-one and one-vs-rest. The multiclass Support Vector Machine used here is an extension of the linear Support Vector Machine and is implemented based on the one-vs-rest (one-vs-all, OVA) classification.
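
A one-vs-rest linear SVM of the kind described above could be set up with scikit-learn as sketched here. This is an assumption-laden example rather than the authors' configuration: the data is synthetic, three classes stand in for the twelve compound emotions, and the kernel and other hyperparameters are not given in the paper.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic stand-in for AU features: 3 classes, 17 features, 30 samples each.
rng = np.random.default_rng(0)
n_classes, n_features, per_class = 3, 17, 30

# Each synthetic class activates a different block of five features.
centres = np.zeros((n_classes, n_features))
for c in range(n_classes):
    centres[c, c * 5:(c + 1) * 5] = 3.0

X = np.vstack([centres[c] + rng.normal(0, 0.3, (per_class, n_features))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), per_class)

# One binary linear SVM is trained per class (that class against all others);
# prediction picks the class whose binary model gives the highest score.
ovr_svm = OneVsRestClassifier(SVC(kernel="linear"))
ovr_svm.fit(X, y)

n_binary_models = len(ovr_svm.estimators_)
train_accuracy = ovr_svm.score(X, y)
print(n_binary_models, train_accuracy)
```

With twelve compound emotions the same wrapper would train twelve binary models, one per emotion.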


4. Results 


The experiment was coded in Python 3 and run on a Windows 10 machine with an Intel Core i5-8250U processor clocked at 2.30 GHz and 16 GB of RAM. The SVM and K-NN classifiers were implemented with the sklearn package. K-fold cross-validation is used to evaluate the model and to guard against over-fitting; the reported results use 10 folds. Because the dataset is small, there is not much difference between the test accuracies obtained with 5-fold and 10-fold cross-validation. As seen in Table 2, the authors tested several values of K and found no significant variation, with 10-fold cross-validation giving the highest accuracy.
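
The mechanics of K-fold cross-validation can be sketched without any library, as below. The helper names `k_fold_indices` and `cross_validate` are hypothetical, the folds are contiguous and unshuffled for simplicity (in practice the samples would normally be shuffled or stratified first), and the `evaluate` callback stands in for training and scoring a classifier.

```python
from typing import Callable, List


def k_fold_indices(n_samples: int, k: int) -> List[List[int]]:
    """Split range(n_samples) into k contiguous folds whose sizes differ by at most 1."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(n_samples: int, k: int,
                   evaluate: Callable[[List[int], List[int]], float]) -> float:
    """Average evaluate(train_idx, test_idx) over k train/test splits."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k


folds = k_fold_indices(25, 10)
# Dummy evaluation that just reports the test-fold size (no model involved).
mean_test_size = cross_validate(25, 10, lambda train, test: float(len(test)))
print([len(f) for f in folds], mean_test_size)
```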


TABLE 2. K Fold Cross Validation

#    K-fold CV       Test accuracy (SVM)
1    [illegible]     96.47
2    K=5             97.03
3    K=7             97.80
4    K=10            98.31


With ten-fold cross-validation, the dataset is partitioned into ten subsets. In each of the ten evaluations, nine subsets are used for training and one for testing, and the outcomes are averaged over the ten evaluations (Priya et al.; P. Gupta, Maji, and Mehra; Lawrence, Campbell, and Skuse). Because it gives the model the chance to train on various train-test splits, it is typically the preferable strategy. Results for both classifiers are shown in Table 3 and Table 4. Specificity, also known as the true negative rate, is highest (1) for the happily surprised and sadly surprised emotions using SVM, and for the fearfully disgusted and sadly fearful emotions using K-NN. The lowest specificity (0.89) is for the sadly disgusted emotion using K-NN. Sensitivity is the true positive rate of the measured emotion. Its peak value (1) occurs for the fearfully angry emotion with SVM and for the fearfully surprised emotion with K-NN. The lowest sensitivity (0.86) is for the disgustedly surprised emotion with K-NN. The False Positive (FP) rate, i.e., the proportion of negative samples incorrectly classified as positive, is 0 when specificity is 1. Finally, the F-measure is the harmonic mean of sensitivity and precision. The maximum F-measure (0.99) is for the fearfully surprised emotion with SVM. The overall classification accuracy is 98.31% using SVM and 93.66% using K-NN. Table 5 compares the proposed model with the state of the art in terms of accuracy.
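
The per-emotion measures discussed above follow directly from one-vs-rest confusion counts. The sketch below shows the standard definitions; the 3-class confusion matrix is invented for illustration and is not the paper's data, and the helper names are hypothetical. It assumes every denominator is non-zero.

```python
from typing import Dict, List, Tuple


def one_vs_rest_counts(confusion: List[List[int]], cls: int) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for one class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(confusion[row][cls] for row in range(n)) - tp
    tn = sum(map(sum, confusion)) - tp - fn - fp
    return tp, fp, tn, fn


def class_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Specificity, sensitivity, FP rate, precision, and F-measure for one class."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    fp_rate = fp / (fp + tn)       # equals 1 - specificity
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {"specificity": specificity, "sensitivity": sensitivity,
            "fp_rate": fp_rate, "precision": precision, "f_measure": f_measure}


# Hypothetical confusion matrix for three classes (illustrative only).
confusion = [
    [18, 2, 0],
    [1, 19, 0],
    [0, 0, 20],
]
metrics_class0 = class_metrics(*one_vs_rest_counts(confusion, 0))
print(metrics_class0)
```

Because the FP rate is the complement of specificity, a class with specificity 1 necessarily has an FP rate of 0, which is the relationship noted in the text.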


AUs Observed in Each Basic and Compound 
Emotion 


In machine learning, a model is trained on a dataset and then makes predictions. However, the predictions alone do not reveal how important individual AUs are in making them (P. Gupta, Maji, and Mehra; Lawrence, Campbell, and Skuse; Nadeeshani, Jayaweera, and Samarasinghe). Because it is difficult to interpret a model's workings from its predicted outcomes, the SHAP value method is used to show the contribution of each AU to the target emotion. As shown in Figure 3, the AU importance plot for the happily surprised emotion, AU1, AU2, AU5, AU12, AU25, and AU26 are the most important AUs in predicting the happily surprised emotion. Similarly, AU4, AU6, AU9, AU10, and AU17 were observed to contribute to predicting the sadly disgusted emotion, as shown in Figure 4. Table 6 summarizes the association of AUs with each compound emotion.
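
SHAP values are based on Shapley values from cooperative game theory. The sketch below computes exact Shapley values by enumerating feature subsets for a tiny toy scorer, to show what "contribution of each AU" means. It does not use the SHAP library, and the weights, the interaction term, and the function names are hypothetical; exact enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(features: List[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley value of each feature for the set function `value`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Toy scorer for one emotion: hypothetical weights plus one interaction term.
weights = {"AU01": 0.5, "AU12": 2.0, "AU25": 1.0}
observed = {"AU01": 1.0, "AU12": 1.0, "AU25": 0.0}


def score_with(present: FrozenSet[str]) -> float:
    """Score when only the AUs in `present` are revealed (others treated as 0)."""
    score = sum(weights[au] * observed[au] for au in present)
    if "AU01" in present and "AU12" in present:
        score += 1.0  # joint effect, shared equally by the two AUs
    return score


phi = shapley_values(list(weights), score_with)
print(phi)
```

The contributions sum to the difference between the full score and the empty-set score, which is the additivity property that SHAP plots rely on.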


5. Conclusion 


FIGURE 3. SHAP AUs Importance Plot for Happily Surprised Emotion

FIGURE 4. SHAP AUs Importance Plot for Sadly Disgusted Emotion

Compound emotions, which are generated by combining two or more fundamental emotion categories, such as happily surprised, happily disgusted, and sadly surprised, were presented in the current study as an essential type of emotion category. Six fundamental and twelve more complex facial expressions of emotion made up 18 categories of emotion. Finally, the identified facial Action Units and their intensities were used to map them reasonably to the associated basic and compound facial emotions. The technique described in this research
TABLE 3. Results with SVM (SVM Accuracy = 98.31)

Combined Emotion         Specificity   Sensitivity   FP rate   Precision   F-measure
Happily surprised        1.00          0.99          0.00      1.0         0.97
Happily disgusted        0.99          0.93          0.01      0.98        0.96
Fearfully surprised      0.99          0.94          0.01      0.98        0.99
Fearfully disgusted      0.98          0.97          0.02      0.97        0.95
Fearfully angry          0.99          1.0           0.01      0.99        0.95
Sadly surprised          1.00          0.92          0.00      1.0         0.91
Sadly fearful            0.97          0.94          0.03      0.96        0.93
Sadly disgusted          0.98          0.98          0.02      0.98        0.95
Sadly angry              0.98          0.89          0.02      0.97        0.91
Disgustedly surprised    0.99          0.94          0.01      0.98        0.95
Angrily surprised        0.97          0.98          0.03      0.97        0.94
Angrily disgusted        0.99          0.98          0.01      0.98        0.97


TABLE 4. Results with K-NN (K-NN Accuracy = 93.66)

Combined Emotion         Specificity   Sensitivity   FP rate   Precision   F-measure
Happily surprised        0.91          0.92          0.09      1.0         0.98
Happily disgusted        0.99          0.98          0.01      0.98        0.94
Fearfully surprised      0.99          1.0           0.01      0.99        0.97
Fearfully disgusted      1.0           0.99          0.0       1.0         0.98
Fearfully angry          0.98          0.98          0.02      0.98        0.93
Sadly surprised          0.99          0.99          0.01      0.99        0.91
Sadly fearful            1.0           0.99          0.0       1.0         0.84
Sadly disgusted          0.89          0.90          0.11      0.89        0.85
Sadly angry              0.90          0.92          0.10      0.90        0.89
Disgustedly surprised    0.93          0.86          0.07      0.92        0.80
Angrily surprised        0.91          0.91          0.09      0.91        0.89
Angrily disgusted        0.94          0.94          0.06      0.94        0.93


TABLE 5. The Comparison of the Accuracy of the Proposed Model with State-of-Art

Literature                                                   Dataset            Classifier   Accuracy
Abdallah et al. (2018) (Abdallah, Guermazi, and Hammami)     CK+                SVM          92%
Hivi and Maiwan (2019) (Dino and Abdulrazzaq)                CK+                SVM          93.33%
                                                                                NN           82.57%
                                                                                KNN          79.97%
Nadeeshani et al. (Nadeeshani, Jayaweera, and Samarasinghe)  CK+, KDEF          KNN          80%
                                                                                CNN          86.66%
Lawrence et al. (Lawrence, Campbell, and Skuse)              CK, JAFFE, PSL     SVM          65.35%
                                                                                KNN          70.87%
Proposed model                                               CK+                KNN          93.66%
                                                                                SVM          98.31%


can recognize all 18 facial expressions. The classifiers are trained and tested with 10-fold cross-validation. Facial expression classification was performed with two classifiers, SVM and K-NN. The experiment's findings demonstrate that SVM is the superior classifier, with a 98.31% correct classification
TABLE 6. Summary of AUs Association with each Compound Emotion

Combined Emotion         AUs observed            Description
Happy                    6, 12, 25               Cheek raiser, Lip corner puller, Lips part
Sad                      1, 4, 15, 17            Inner brow raiser, Brow lowerer, Lip corner depressor, Chin raiser
Fearful                  1, 4, 20, 25            Inner brow raiser, Brow lowerer, Lip stretcher, Lips part
Angry                    4, 7, 17                Brow lowerer, Lid tightener, Chin raiser
Surprised                1, 2, 5, 25, 26         Inner brow raiser, Outer brow raiser, Upper lid raiser, Lips part, Jaw drop
Disgusted                4, 9, 10, 17            Brow lowerer, Nose wrinkle, Upper lip raiser, Chin raiser
Happily surprised        1, 2, 5, 12, 25, 26     Inner brow raiser, Outer brow raiser, Upper lid raiser, Lip corner puller, Lips part, Jaw drop
Happily disgusted        6, 9, 10, 12, 25        Cheek raiser, Nose wrinkle, Upper lip raiser, Lip corner puller, Lips part
Fearfully surprised      1, 2, 5, 20, 25         Inner brow raiser, Outer brow raiser, Upper lid raiser, Lip stretcher, Lips part
Fearfully disgusted      1, 4, 9, 10, 20, 25     Inner brow raiser, Brow lowerer, Nose wrinkle, Upper lip raiser, Lip stretcher, Lips part
Fearfully angry          4, 5, 7, 20, 25         Brow lowerer, Upper lid raiser, Lid tightener, Lip stretcher, Lips part
Sadly surprised          1, 4, 25, 26            Inner brow raiser, Brow lowerer, Lips part, Jaw drop
Sadly fearful            1, 4, 20, 25            Inner brow raiser, Brow lowerer, Lip stretcher, Lips part
Sadly disgusted          4, 6, 9, 10, 17         Brow lowerer, Cheek raiser, Nose wrinkle, Upper lip raiser, Chin raiser
Sadly angry              4, 7, 15, 17            Brow lowerer, Lid tightener, Lip corner depressor, Chin raiser
Disgustedly surprised    1, 2, 5, 10, 17         Inner brow raiser, Outer brow raiser, Upper lid raiser, Upper lip raiser, Chin raiser
Angrily surprised        4, 7, 25, 26            Brow lowerer, Lid tightener, Lips part, Jaw drop
Angrily disgusted        4, 7, 10, 17            Brow lowerer, Lid tightener, Upper lip raiser, Chin raiser
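
The AU sets in Table 6 can be compared with simple set operations to see how a compound emotion relates to its two basic components. The sketch below copies a few rows of Table 6; pairing each compound emotion with its components by name (for example, happily surprised with happy and surprised) follows the example given in the Introduction and is otherwise an assumption, and the helper name `compare` is hypothetical.

```python
from typing import Dict, Set, Tuple

# AU sets copied from Table 6 (basic emotions).
BASIC: Dict[str, Set[int]] = {
    "happy": {6, 12, 25},
    "sad": {1, 4, 15, 17},
    "angry": {4, 7, 17},
    "surprised": {1, 2, 5, 25, 26},
}

# AU sets copied from Table 6 (a few compound emotions) with assumed components.
COMPOUND: Dict[str, Tuple[Set[int], Tuple[str, str]]] = {
    "happily surprised": ({1, 2, 5, 12, 25, 26}, ("happy", "surprised")),
    "sadly angry": ({4, 7, 15, 17}, ("sad", "angry")),
    "angrily surprised": ({4, 7, 25, 26}, ("angry", "surprised")),
}


def compare(name: str) -> Dict[str, Set[int]]:
    """Split a compound emotion's AUs into inherited, dropped, and novel ones."""
    aus, parts = COMPOUND[name]
    union = set().union(*(BASIC[p] for p in parts))
    return {
        "inherited": aus & union,  # AUs shared with at least one component
        "dropped": union - aus,    # component AUs absent from the compound
        "novel": aus - union,      # compound AUs found in neither component
    }


result = compare("happily surprised")
print(result)
```

For these rows the compound AU set is a subset of the union of its components, which is consistent with the statement that compound emotions can be generated from basic ones while still being produced differently.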


rate. Further, the SHAP value method was used to identify the major contributing Action Units for each of the 18 emotions. In future work, to improve accuracy, the suggested strategy can be tested with various machine learning algorithms on our own dataset, which we are currently building.